160 research outputs found
Distributed Queuing in Dynamic Networks
We consider the problem of forming a distributed queue in the adversarial
dynamic network model of Kuhn, Lynch, and Oshman (STOC 2010) in which the
network topology changes from round to round but the network stays connected.
This is a synchronous model in which network nodes are assumed to be fixed, the
communication links for each round are chosen by an adversary, and nodes do not
know who their neighbors are for the current round before they broadcast their
messages. Queue requests may arrive over rounds at arbitrary nodes and the goal
is to eventually enqueue them in a distributed queue. We present two algorithms
that give a total distributed ordering of queue requests in this model. We
measure the performance of our algorithms through round complexity, which is
the total number of rounds needed to solve the distributed queuing problem. We
show that in 1-interval connected graphs, where the communication links change
arbitrarily between every round, it is possible to solve the distributed
queueing problem in O(nk) rounds using O(log n) size messages, where n is the
number of nodes in the network and k <= n is the number of queue requests.
Further, we show that for more stable graphs, e.g. T-interval connected graphs
where the communication links change in every T rounds, the distributed queuing
problem can be solved in O(n+ (nk/min(alpha,T))) rounds using the same O(log n)
size messages, where alpha > 0 is the concurrency level parameter that captures
the minimum number of active queue requests in the system in any round. These
results hold in any arbitrary (sequential, one-shot concurrent, or dynamic)
arrival of k queue requests in the system. Moreover, our algorithms ensure
correctness in the sense that each queue request is eventually enqueued in the
distributed queue after it is issued and each queue request is enqueued exactly
once. We also provide an impossibility result for this distributed queuing
problem in this model. To the best of our knowledge, these are the first
solutions to the distributed queuing problem in adversarial dynamic networks.Comment: In Proceedings FOMC 2013, arXiv:1310.459
Concurrent counting is harder than queuing
We compare the complexities of two fundamental distributed coordination problems, distributed counting and distributed queuing, in a concurrent setting. In both distributed counting and queuing, processors in a distributed system issue operations which are organized into a total order. In counting, each participating processor receives the rank of its operation in the total order, where as in queuing, a processor receives the identity of its predecessor in the total order. Many coordination applications can be solved using either distributed counting or queuing, and it is useful to know which of counting or queuing is the easier problem. Our results show that concurrent counting is harder than concurrent queuing on a variety of processor interconnection topologies, including high and low diameter graphs. For all these topologies, we show that the concurrent delay complexity of a particular solution to queuing, the arrow protocol, is asymptotically smaller than a lower bound on the complexity of any solution to counting
Window Based BFT Blockchain Consensus
There is surge of interest to the blockchain technology not only in the
scientific community but in the business community as well. Proof of Work (PoW)
and Byzantine Fault Tolerant (BFT) are the two main classes of consensus
protocols that are used in the blockchain consensus layer. PoW is highly
scalable but very slow with about 7 (transactions/second) performance. BFT
based protocols are highly efficient but their scalability are limited to only
tens of nodes. One of the main reasons for the BFT limitation is the quadratic
communication complexity of BFT based protocols for nodes that
requires broadcasting. In this paper, we present the {\em Musch}
protocol which is BFT based and provides communication complexity
for failures and nodes, where , without compromising the
latency. Hence, the performance adjusts to such that for constant the
communication complexity is linear. Musch achieves this by introducing the
notion of exponentially increasing windows of nodes to which complains are
reported, instead of broadcasting to all the nodes. To our knowledge, this is
the first BFT-based blockchain protocol which efficiently addresses
simultaneously the issues of communication complexity and latency under the
presence of failures.Comment: 2018 IEEE International Conference on Internet of Things (iThings)
and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber,
Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData
BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs
We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU clusters. The BifurKTM design includes: GPU KoSTM, a new software transactional memory conflict detection scheme that exploits relaxed consistency to increase throughput; and KoDTM, a Distributed Transactional Memory model that combines the Data- and Control- flow models to greatly reduce communication overheads.
Despite the allure of huge speedups, GPUs are limited in use due to their programmability and extreme sensitivity to workload characteristics. These become daunting concerns when considering a distributed GPU cluster, wherein a programmer must design algorithms to hide communication latency by exploiting data regularity, high compute intensity, etc. The BifurKTM design allows GPU programmers to exploit a new workload characteristic: the percentage of the workload that is Read-Only (e.g. reads but does not modify shared memory), even when this percentage is not known in advance. Programmers designate transactions that are suitable for Approximate Consistency, in which transactions "appear" to execute at the most convenient time for preventing conflicts. By leveraging Approximate Consistency for Read-Only transactions, the BifurKTM runtime system offers improved performance, application flexibility, and programmability without introducing any errors into shared memory.
Our experiments show that Approximate Consistency can improve BkTM performance by up to 34x in applications with moderate network communication utilization and a read-intensive workload. Using Approximate Consistency, BkTM can reduce GPU-to-GPU network communication by 99%, reduce the number of aborts by up to 100%, and achieve an average speedup of 18x over a similarly sized CPU cluster while requiring minimal effort from the programmer
- …